31 research outputs found

Brook GLES Pi: democratising accelerator programming

Author: Bakhoda Ali
Bellard Fabrice
Leskela Jyrki
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/08/2018

Computing today relies heavily on accelerators; however, the cost of the hardware prevents equal access to heterogeneous programming. In this work we present Brook GLES Pi, a port of the accelerator programming language Brook. Our solution, primarily focused on the educational platform Raspberry Pi, makes it possible to teach, experiment with, and take advantage of heterogeneous programming on any low-cost embedded device featuring an OpenGL ES 2 GPU, democratising access to accelerator programming. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Peer reviewed. Postprint (author's final draft).

Crossref

UPCommons. Portal del coneixement obert de la UPC

BaseSAFE: Baseband SAnitized Fuzzing through Emulation

Author: Bellard Fabrice
Falk Brandon
GPP.
Hay Roee
Maier Dominik
Mulliner Collin
Nguyen Anh Quynh
Park Shinjo
Rupprecht David
Schumilo Sergej
Song Dokyung
Tian Dave Jing
Weinmann Ralf-Philipp
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/05/2020

Rogue base stations are an effective attack vector. Cellular basebands represent a critical part of the smartphone's security: they parse large amounts of data even before authentication. They can, therefore, grant an attacker a very stealthy way to gather information about calls placed and even to escalate to the main operating system, over the air. In this paper, we discuss a novel cellular fuzzing framework that aims to help security researchers find critical bugs in cellular basebands and similar embedded systems. BaseSAFE allows partial rehosting of cellular basebands for fast instrumented fuzzing off-device, even for closed-source firmware blobs. BaseSAFE's sanitizing drop-in allocator enables spotting heap-based buffer overflows quickly. Using our proof-of-concept harness, we fuzzed various parsers of the Nucleus RTOS-based MediaTek cellular baseband that are accessible from rogue base stations. The emulator instrumentation is highly optimized, reaching hundreds of executions per second on each core for our complex test case, around 15k test cases per second in total. Furthermore, we discuss attack vectors for baseband modems. To the best of our knowledge, this is the first use of emulation-based fuzzing for security testing of commercial cellular basebands. Most of the tooling and approaches of BaseSAFE are also applicable to other low-level kernels and firmware. Using BaseSAFE, we were able to find memory corruptions, including heap out-of-bounds writes, using our proof-of-concept fuzzing harness in the MediaTek cellular baseband. BaseSAFE, the harness, and a large collection of LTE signaling message test cases will be released open-source upon publication of this paper.
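
To make the idea of a sanitizing drop-in allocator more concrete, here is a minimal sketch in Python. It is not BaseSAFE's implementation: the class `SanitizingHeap`, its `malloc` and `write` methods, and the 16-byte redzone are hypothetical names and values chosen for illustration. The sketch assumes a simple bump allocator over an emulated heap, with every allocation surrounded by unaddressable redzones so that an overflowing write is reported at once instead of silently corrupting a neighbouring chunk.

```python
class SanitizingHeap:
    """Toy emulated heap with redzones around every allocation (illustrative only)."""

    REDZONE = 16  # assumed redzone size in bytes, not taken from the paper

    def __init__(self, size):
        self.mem = bytearray(size)    # backing storage of the emulated heap
        self.valid = bytearray(size)  # 1 where guest writes are allowed, 0 elsewhere
        self.top = 0                  # end of the most recent allocation

    def malloc(self, n):
        """Return the start address of n writable bytes, preceded by a redzone."""
        start = self.top + self.REDZONE
        end = start + n
        if end + self.REDZONE > len(self.mem):
            raise MemoryError("emulated heap exhausted")
        self.valid[start:end] = b"\x01" * n
        self.top = end
        return start

    def write(self, addr, data):
        """Store data at addr, failing loudly if any byte falls outside an allocation."""
        end = addr + len(data)
        if addr < 0 or end > len(self.mem) or not all(self.valid[addr:end]):
            raise RuntimeError(f"heap out-of-bounds write at {addr:#x}")
        self.mem[addr:end] = data


heap = SanitizingHeap(256)
buf = heap.malloc(8)
heap.write(buf, b"A" * 8)      # stays inside the allocation
try:
    heap.write(buf, b"A" * 9)  # one byte too many: lands in the redzone
except RuntimeError as err:
    print(err)
```

In a fuzzing setup, the raised error plays the role of a crash signal that the fuzzer records together with the offending input.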

arXiv.org e-Print Archive

Crossref

OS Support for Thread Migration and Distribution in the Fully Heterogeneous Datacenter

Author: Barbalace Antonio
Bellard Fabrice
Sherwood Timothy
Zaharia Matei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/05/2017

Crossref

The University of Manchester - Institutional Repository

Fast and portable vector dsp simulation through automatic vectorization

Author: Bellard Fabrice
Intel Corporation
Michel L.
Redford J.
Publication venue: Association for Computing Machinery, Inc
Publication date: 01/01/2018

Vector DSPs are quite common in embedded SoCs used in compute-intensive domains such as imaging and wireless communication. To achieve short time-to-market, it is crucial to provide system architects and SW developers with fast and accurate instruction set simulators of such DSPs. To this end, a methodology for accelerating the simulation of vector instructions in vector DSPs is proposed. The acceleration is achieved by enabling automatic translation of the vector instructions in a given vector DSP binary into host SIMD instructions. The key advantage of the proposed methodology is its independence from the host architecture. Empirical evaluation, using a set of commercial vector DSPs, shows that the proposed methodology provides a 4x average reduction in simulation time of a vector instruction and a 2x average reduction in simulation time of a whole application.

Repository TU/e

Crossref

Pure OAI Repository

Harissa: A flexible and efficient Java environment mixing bytecode and compiled code

Author: Barbara Moura
Charles Consel
Fabrice Bellard
Gilles Muller

The Java language provides a promising solution to the design of safe programs, with an application spectrum ranging from Web services to operating system components. The well-known tradeoff of Java's portability is the inefficiency of its basic execution model, which relies on the interpretation of an object-based virtual machine. Many solutions have been proposed to overcome this problem, such as just-in-time (JIT) and off-line bytecode compilers. However, most compilers trade efficiency for either portability or the ability to dynamically load bytecode. In this paper, we present an approach which reconciles portability and efficiency, and preserves the ability to dynamically load bytecode. We have designed and implemented an efficient environment for the execution of Java programs, named Harissa. Harissa permits the mixing of compiled and interpreted methods. Harissa's compiler translates Java bytecode to C, incorporating aggressive optimizations such as virtual method call optimization based on the Clas

CiteSeerX

Arbitrary and Variable Precision Floating-Point Arithmetic Support in Dynamic Binary Translation

Author: Bellard Fabrice
Emilio
Fleischer Bruce
Jaiswal Manish Kumar
Tiwari Sugandha
Publication venue: ACM IEEE
Publication date: 18/01/2021

Floating-point hardware support was more or less settled 35 years ago by the adoption of the IEEE 754 standard. However, many scientific applications require higher accuracy than what can be represented on 64 bits, and to that end make use of dedicated arbitrary precision software libraries. To reach a good performance/accuracy trade-off, developers use variable precision, requiring e.g. more accuracy as the computation progresses. Hardware accelerators for this kind of computation do not exist yet, and independently of the actual quality of the underlying arithmetic computations, defining the right instruction set architecture, memory representations, etc., for them is a challenging task. We investigate in this paper the support for arbitrary and variable precision arithmetic in a dynamic binary translator, to help gain insight into what such an accelerator could provide as an interface to compilers, and thus programmers. We detail our design and present an implementation in QEMU using the MPFR library for the RISCV processo
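
The notion of variable precision, where more accuracy is requested as the computation progresses, can be illustrated with a short Python sketch. It uses the standard `decimal` module as a stand-in for an arbitrary precision library and has no connection to the QEMU implementation described in the abstract. The function name `sqrt_variable_precision`, the starting precision of 15 digits, and the 5 guard digits are assumptions made for this example. A Newton iteration for a square root roughly doubles the number of correct digits per step, so the working precision is doubled along with it rather than being fixed at the final target from the start.

```python
import math
from decimal import Decimal, localcontext


def sqrt_variable_precision(n, target_digits):
    """Approximate sqrt(n) to target_digits significant digits.

    Newton's iteration x <- (x + n/x) / 2 is run with a working precision
    that doubles on every step, so early steps use cheap low-precision
    arithmetic and only the last step pays for full precision.
    """
    guard = 5                          # assumed number of extra guard digits
    value = Decimal(n)
    x = Decimal(math.sqrt(float(n)))   # double-precision starting estimate
    prec = 15                          # digits we trust in the float estimate
    while prec < target_digits:
        prec = min(2 * prec, target_digits)
        with localcontext() as ctx:
            ctx.prec = prec + guard
            x = (x + value / x) / 2
    with localcontext() as ctx:
        ctx.prec = target_digits
        return +x                      # unary plus rounds to the context precision


print(sqrt_variable_precision(2, 50))
```

Starting from a 15-digit estimate, a 50-digit result needs only two Newton steps, the first at 35 and the second at 55 working digits.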

Crossref

Hal - Université Grenoble Alpes

HAL Descartes

Improving Remote Desktopping Through Adaptive Record/Replay

Author: Bellard Fabrice
Marin Lopez P.T.A.
Sandberg Russel
Xu Min
Publication venue: Association for Computing Machinery (ACM)

Crossref

A fast analytical model of fully associative caches

Author: Bellard Fabrice
Beyls Kristof
Eklov David
Pouchet Louis-Noël
Terpstra Dan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/06/2019

While the cost of computation is an easy-to-understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the cost of data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight cache model for fully associative caches with least recently used (LRU) replacement policy that gives fast and accurate results. We count the cache misses without explicit enumeration of all memory accesses by using symbolic counting techniques twice: 1) to derive the stack distance for each memory access and 2) to count the memory accesses with stack distance larger than the cache size. While this technique seems infeasible in theory, due to non-linearities after the first round of counting, we show that the counting problems are sufficiently linear in practice. Our cache model often computes the results within seconds and, contrary to simulation, the execution time is mostly independent of the problem size. Our evaluation measures modeling errors below 0.6% on real hardware. By providing accurate data placement information we enable memory hierarchy aware software development. Comment: 14 pages, 16 figures, PLDI1
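
The two quantities named in the abstract, the stack distance of each access and the number of accesses whose stack distance exceeds the cache capacity, can be shown on a small trace. The Python sketch below enumerates every access explicitly, which is exactly what the paper's symbolic counting avoids, so it should be read only as a reference definition. The function names `stack_distances` and `count_misses` are hypothetical. The sketch assumes the convention that the stack distance is the number of distinct other cache lines touched since the previous access to the same line, so an access misses when it is a first touch or when its distance is at least the number of lines the cache holds.

```python
from collections import OrderedDict


def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for line in trace:
        if line in stack:
            keys = list(stack)
            # Distinct lines accessed since the previous access to `line`
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)
        else:
            distances.append(None)
            stack[line] = True
    return distances


def count_misses(trace, cache_lines):
    """Count misses of a fully associative LRU cache holding `cache_lines` lines."""
    return sum(
        1 for d in stack_distances(trace) if d is None or d >= cache_lines
    )


trace = list("abcabda")
print(stack_distances(trace))  # [None, None, None, 2, 2, None, 2]
print(count_misses(trace, 3))  # 4
print(count_misses(trace, 2))  # 7
```

With three cache lines only the four first touches miss; with two lines every reuse has distance 2 and therefore misses as well.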

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref